2022/10/13

One-factorial design

Research question: Does happiness differ between the Swiss regions?

Data

Hypotheses

Hypothesis 1: The respondents from the 7 regions reported different mean happiness levels.

Hypothesis 2: Respondents from Espace Mittelland reported higher mean happiness levels than Zentralschweiz.

Look at the data - numerical

Nuts2               n    mean  trimmed10  median  sd    var   skew  kurt
Région lémanique    570  2.91  2.91       3       0.92  0.85  0.17  3.58
Espace Mittelland   720  2.72  2.69       3       0.89  0.79  0.36  3.01
Nordwestschweiz     427  2.70  2.67       3       0.88  0.78  0.47  4.05
Zürich              513  2.76  2.72       3       0.97  0.95  0.59  3.93
Ostschweiz          433  2.73  2.68       3       0.95  0.90  0.75  4.56
Zentralschweiz      275  2.61  2.55       3       0.85  0.73  0.62  4.00
Ticino              160  2.91  2.82       3       0.98  0.95  1.16  6.14

Curran et al. (1996) suggest that |skew| < 2 and |kurtosis| < 7 can be considered approximately normal.
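A hedged sketch of how this rule of thumb could be checked per region (`df_1f` with `H1` and `Nuts2` as in the slides; `psych`'s skew/kurtosis estimators may differ slightly from the table above):

```r
library(dplyr)

# Flag regions that violate the Curran et al. (1996) rule of thumb;
# psych::skew and psych::kurtosi are one common choice of estimators
df_1f %>%
  group_by(Nuts2) %>%
  summarise(skew = psych::skew(H1),
            kurt = psych::kurtosi(H1)) %>%
  mutate(acceptable = abs(skew) < 2 & abs(kurt) < 7)
```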

Look at the data - graphical

Box plots are excellent for displaying distributions.
Why are they not a good choice in this case?
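A minimal sketch of the problem (`df_1f` from the slides): H1 is a discrete rating and every region has a median of 3, so the boxes collapse onto the same few values and hide the distributional differences.

```r
# With a coarse rating scale the quartiles nearly coincide across groups,
# so all boxes look almost identical
boxplot(H1 ~ Nuts2, data = df_1f, las = 2)
```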

Look at the data - graphical

WARNING: depending on the bin size histograms can be misleading.
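A sketch of this pitfall (`df_1f`/`H1` from the slides): the same variable drawn with two bin settings can suggest quite different shapes.

```r
par(mfrow = c(1, 2))                           # two plots side by side
hist(df_1f$H1, breaks = 4,  main = "4 breaks")
hist(df_1f$H1, breaks = 40, main = "40 breaks")
par(mfrow = c(1, 1))                           # reset the layout
```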

Look at the data - graphical

Quantile-quantile (Q-Q) plots are a great way to compare the sample distribution to a theoretical distribution. Ideally, the points would fall on the line.

Why do we see a stair pattern?

Look at the data - graphical

Add some random noise (normal with mean 0 and SD 0.5) to break up the discrete values.
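A sketch of this jittering step (`df_1f`/`H1` from the slides; the seed is an assumption for reproducibility):

```r
set.seed(42)                                    # reproducible noise
h1_jit <- df_1f$H1 + rnorm(nrow(df_1f), mean = 0, sd = 0.5)
qqnorm(h1_jit); qqline(h1_jit)                  # the stair pattern disappears
```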

Look at the data - graphical

Warning: if there are more than 2 groups, then non-overlapping CIs don’t necessarily imply a significant difference.

Analysis - parametric

Omnibus

oneway.test(H1~Nuts2,var.equal=FALSE, data=df_1f)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  H1 and Nuts2
## F = 5.2507, num df = 6.0, denom df = 1030.9, p-value = 2.434e-05

Levine and Hullett (2002) recommend ω² or η² as effect sizes (estimators of the variance explained by the factor) for ANOVAs.

  • partial η² (used by SPSS) depends strongly on the variability of the residuals
  • η² is biased, e.g., when n is small or there are many factor levels
  • in the case of multiple factors, Keppel (1991) recommends partial ω²
aov(H1~Nuts2, data=df_1f) %>% effectsize::omega_squared(verbose=F) %>% toTable()
Parameter    Omega2    CI   CI_low  CI_high
Nuts2     0.0079776  0.95 0.002278        1

Hypothesis 1: The respondents from the 7 regions reported different mean happiness levels. –> the null hypothesis can be rejected, but the effect is negligible

Digression: Effect size


The binning of effect sizes is just a rule of thumb and somewhat arbitrary.

ω²          Cohen (1992)  Field (2013)
very small  < 0.02        < 0.01
small       < 0.13        < 0.06
medium      < 0.26        < 0.14
large       >= 0.26       >= 0.14


When rating the effect size, consider the customs of your (sub-)domain and, more importantly, the size of other known effects on your dependent variable.

The R package effectsize includes various rules to help with the interpretation.

effectsize::interpret_omega_squared(0.008, rules = "cohen1992")
## [1] "very small"
## (Rules: cohen1992)

Analysis - parametric

Contrasts

The typical way of testing Hypothesis 2 (Espace Mittelland happier than Zentralschweiz) is a linear contrast (but this is NOT the recommended way).

f1_emm <- lm(H1~Nuts2, data=df_1f) %>% emmeans::emmeans('Nuts2', data=df_1f)
emmeans::test(
  emmeans::contrast(f1_emm, list(ac1=c(0, 1, 0, 0, 0, -1, 0))),
  adjust='none')
##  contrast estimate     SE   df t.ratio p.value
##  ac1         0.114 0.0651 3091   1.753  0.0797

Note 1: This analytic contrast tests a single pre-specified hypothesis; hence no p-adjustment is needed. Comparisons without specific a-priori hypotheses (e.g., a full set of orthogonal contrasts) would need an adjustment of the significance level (e.g., via the False Discovery Rate).

Note 2: This analytic contrast is two-sided, but H2 is one-sided -> the p-value needs to be halved.
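Halving the two-sided p-value from the contrast output above:

```r
# Valid because the estimate (0.114) points in the hypothesized direction
p_one_sided <- 0.0797 / 2
p_one_sided
## [1] 0.03985
```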

BUT: Linear contrasts are very sensitive to variance heterogeneity. Jan & Shieh (2019) recommend Welch’s t-test instead.

Analysis - parametric

Contrasts

Perform Welch’s t-test

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  t.test(H1~Nuts2, data=., alternative='greater')
## 
##  Welch Two Sample t-test
## 
## data:  H1 by Nuts2
## t = 1.866, df = 513.61, p-value = 0.03131
## alternative hypothesis: true difference in means between group Espace Mittelland and group Zentralschweiz is greater than 0
## 95 percent confidence interval:
##  0.01333942        Inf
## sample estimates:
## mean in group Espace Mittelland    mean in group Zentralschweiz 
##                        2.725000                        2.610909

Analysis - parametric

Contrasts

Get effect size

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  mutate(Nuts2 = forcats::fct_drop(Nuts2)) %>%
  effsize::cohen.d(H1~Nuts2, data=.)
## 
## Cohen's d
## 
## d estimate: 0.1299925 (negligible)
## 95 percent confidence interval:
##        lower        upper 
## -0.009234415  0.269219496

Hypothesis 2: Respondents from Espace Mittelland reported higher mean happiness levels than Zentralschweiz.

–> the null hypothesis can be rejected, but the effect is negligible


Analysis - parametric

Post-Hoc Analysis

Games-Howell Modification of the Tukey Test

Works with unequal sample sizes and heterogeneous variances.

rstatix::games_howell_test(df_1f, H1~Nuts2, conf.level = 0.95, detailed = FALSE)
## # A tibble: 21 × 8
##    .y.   group1            group2      estimate conf.…¹ conf.h…²   p.adj p.adj…³
##  * <chr> <chr>             <chr>          <dbl>   <dbl>    <dbl>   <dbl> <chr>  
##  1 H1    Région lémanique  Espace Mit… -0.189    -0.339 -0.0388  4   e-3 **     
##  2 H1    Région lémanique  Nordwestsc… -0.218    -0.389 -0.0482  3   e-3 **     
##  3 H1    Région lémanique  Zürich      -0.158    -0.328  0.0130  9.2 e-2 ns     
##  4 H1    Région lémanique  Ostschweiz  -0.182    -0.358 -0.00555 3.8 e-2 *      
##  5 H1    Région lémanique  Zentralsch… -0.303    -0.494 -0.113   6.31e-5 ****   
##  6 H1    Région lémanique  Ticino      -0.00779  -0.264  0.249   1   e+0 ns     
##  7 H1    Espace Mittelland Nordwestsc… -0.0294   -0.189  0.130   9.98e-1 ns     
##  8 H1    Espace Mittelland Zürich       0.0313   -0.129  0.191   9.97e-1 ns     
##  9 H1    Espace Mittelland Ostschweiz   0.00710  -0.159  0.173   1   e+0 ns     
## 10 H1    Espace Mittelland Zentralsch… -0.114    -0.295  0.0669  5.04e-1 ns     
## # … with 11 more rows, and abbreviated variable names ¹​conf.low, ²​conf.high,
## #   ³​p.adj.signif
## # ℹ Use `print(n = ...)` to see more rows

Analysis - parametric

Post-Hoc Analysis


Pairwise t-tests (pooled SD) with p-value adjustment

pairwise.t.test(df_1f$H1, df_1f$Nuts2, pool.sd=TRUE, p.adj="fdr")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  df_1f$H1 and df_1f$Nuts2 
## 
##                   Région lémanique Espace Mittelland Nordwestschweiz Zürich 
## Espace Mittelland 0.00171          -                 -               -      
## Nordwestschweiz   0.00171          0.69941           -               -      
## Zürich            0.01678          0.69105           0.43712         -      
## Ostschweiz        0.00797          0.92449           0.69105         0.75808
## Zentralschweiz    0.00015          0.13945           0.34979         0.07963
## Ticino            0.92449          0.06290           0.04002         0.13636
##                   Ostschweiz Zentralschweiz
## Espace Mittelland -          -             
## Nordwestschweiz   -          -             
## Zürich            -          -             
## Ostschweiz        -          -             
## Zentralschweiz    0.14054    -             
## Ticino            0.08487    0.00644       
## 
## P value adjustment method: fdr

Analysis - Permutation test

Hypothesis 1

df_1f %>% ez::ezPerm(
  dv = H1,
  wid = IDNO,
  between = Nuts2,
  perms = 20, # THIS SHOULD BE 1000 AT LEAST
  parallel=TRUE
) 
##   Effect p p<.05
## 1  Nuts2 0     *

Hypothesis 2

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  exactRankTests::perm.test(H1~Nuts2, data=., alternative='greater', exact=TRUE)
## 
##  2-sample Permutation Test
## 
## data:  H1 by Nuts2
## T = 1962, p-value = 0.03617
## alternative hypothesis: true mu is greater than 0

Analysis - Robust

bootstrap & trimmed means


For detailed information on robust hypothesis testing see Wilcox (2013).

Hypothesis 1

WRS2::t1waybt(
  H1 ~ Nuts2, 
  data = df_1f, 
  tr = 0.2, # 20% trimmed from each tail (the middle 60% is kept)
  nboot = 1000 # 10000 would be better
  ) 
## Call:
## WRS2::t1waybt(formula = H1 ~ Nuts2, data = df_1f, tr = 0.2, nboot = 1000)
## 
## Effective number of bootstrap samples was 1000.
## 
## Test statistic: 5.0182 
## p-value: 0 
## Variance explained: 0.026 
## Effect size: 0.161

The effect size ξ was proposed by Wilcox & Tian (2011) and is a heteroscedastic generalization of Cohen’s d.

Analysis - Robust

Hypothesis 2


Yuen (1974) proposed a test statistic for a two-sample trimmed mean test which allows for the presence of unequal variances. Without trimming this is Welch’s t-test.

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  mutate(Nuts2 = forcats::fct_drop(Nuts2)) %>%
  WRS2::yuenbt(H1~Nuts2, data=., tr = 0.2, nboot = 1000, side = TRUE)
## Call:
## WRS2::yuenbt(formula = H1 ~ Nuts2, data = ., tr = 0.2, nboot = 1000, 
##     side = TRUE)
## 
## Test statistic: 1.3107 (df = NA), p-value = 0.194
## 
## Trimmed mean difference:  0.07723 
## 95 percent confidence interval:
## -0.0407     0.1952

Algina, Keselman, and Penfield (2005) propose a robust version of Cohen’s d.

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  mutate(Nuts2 = forcats::fct_drop(Nuts2)) %>%
  WRS2::akp.effect(H1~Nuts2, data=.)
## $AKPeffect
## [1] 0.09988292
## 
## $AKPci
## [1] -0.02673778  0.23106973
## 
## $alpha
## [1] 0.05
## 
## $call
## WRS2::akp.effect(formula = H1 ~ Nuts2, data = .)
## 
## attr(,"class")
## [1] "AKP"

Analysis - Robust

bootstrap & trimmed means

Post-Hoc

res <- WRS2::mcppb20(
  H1 ~ Nuts2,
  data = df_1f,
  nboot = 1000 # 10000 would be better
)
adjust_p(res) # custom function to add FDR adjusted p
## # A tibble: 21 × 7
##    r1                r2                psihat ci.lower ci.upper `p-value`  p_adj
##    <chr>             <chr>              <dbl>    <dbl>    <dbl>     <dbl>  <dbl>
##  1 Nordwestschweiz   Espace Mittelland  0.260   0.105   0.431       0     0     
##  2 Nordwestschweiz   Zentralschweiz     0.237   0.0716  0.489       0     0     
##  3 Nordwestschweiz   Région lémanique   0.233   0.0581  0.432       0     0     
##  4 Nordwestschweiz   Zürich             0.255   0.0686  0.469       0     0     
##  5 Nordwestschweiz   Ostschweiz         0.338   0.151   0.537       0     0     
##  6 Ostschweiz        Ticino            -0.225  -0.571   0.00871     0.004 0.014 
##  7 Espace Mittelland Ticino            -0.148  -0.487   0.0509      0.018 0.054 
##  8 Zürich            Ticino            -0.142  -0.433   0.0777      0.029 0.0761
##  9 Zentralschweiz    Ticino            -0.125  -0.427   0.117       0.068 0.159 
## 10 Région lémanique  Ticino            -0.120  -0.448   0.0751      0.076 0.160 
## # … with 11 more rows
## # ℹ Use `print(n = ...)` to see more rows

Condition testing (if you insist)

Step 1: perform analysis & save residuals

df_1f %<>% mutate(res = lm(H1 ~ Nuts2, data = .)$residuals)

Step 2: test normality with Lilliefors Test (Anderson-Darling cannot deal with ties)

nortest::lillie.test(df_1f$res)
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  df_1f$res
## D = 0.15946, p-value < 2.2e-16

Step 3: test variance homogeneity with Brown-Forsythe (Levene is sensitive to normality)

lawstat::levene.test(df_1f$res, df_1f$Nuts2, location='median')
## 
##  Modified robust Brown-Forsythe Levene-type test based on the absolute
##  deviations from the median
## 
## data:  df_1f$res
## Test Statistic = 1.0914, p-value = 0.3649

2-Factor ANOVA

  • there is no Welch version or something similar :’(
  • especially problematic if factor combinations lead to varying cell counts
  • Xu et al. (2013) proposed a parametric bootstrap to deal with unequal variances. [included in twowaytests]
  • the previously used ezPerm() in ez can execute the permutation test with multiple factors (slow, just main effects)
  • the previously used WRS2 package provides bootstrapping and trimmed means (my favorite in this list)
  • if the robust method cannot deal with covariates, regress the DV on them (possibly with a robust regression), save the residuals, and use the residuals instead of the DV in the analysis
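A hedged sketch of this residualization workaround (the covariates `age` and `education` are hypothetical placeholders):

```r
# Step 1: regress the DV on the covariates and keep the residuals
# (plain lm here; a robust regression such as MASS::rlm would also work;
# assumes complete cases so the residuals align with the rows)
df_1f$H1_res <- residuals(lm(H1 ~ age + education, data = df_1f))

# Step 2: run the robust analysis on the residuals instead of the DV
WRS2::t1waybt(H1_res ~ Nuts2, data = df_1f, tr = 0.2, nboot = 1000)
```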

repeated measures ANOVA (within)

  • creating bar plots to provide “inference by eye” is tricky: show all pairwise mean differences with error bars instead of group means (see Franz & Loftus (2012))
  • if you want to use the traditional parametric approach, don’t pre-test sphericity (that would be Mauchly’s test); instead, apply the Greenhouse–Geisser or the Huynh–Feldt correction directly
  • the previously used ezPerm() in ez can execute the permutation test with within factors (slow, just main effects)
  • the previously used WRS2 package provides bootstrapping and trimmed means (my favorite in this list)

mixed ANOVA

  • the previously used WRS2 package provides bootstrapping and trimmed means for one between and one within factor
  • ezBoot() in ez can execute bootstrapping (no trimming) for an arbitrary number of between and within factors
  • the previously used ezPerm() in ez can execute the permutation test with mixed designs (slow, just main effects)

Homework (graded)

due by 2022-10-27

  • 2-factors:
    • Is there a Swiss region where women/men are especially happy?
    • use the variables H1 (happy/unhappy), Nuts2 (region) and Demo1 (gender) from the MOSAiCH data set
  • mixed:
    • Can you reproduce the result regarding psychological distress in Prudenzi, et al. (2022)?
    • The data is available here.
    • Would the result change with a robust approach?

You can use the starter Notebook on GitHub.